Emphasized Accent Phrase Prediction from Text for Advertisement Text-To-Speech Synthesis
Abstract
Realizing expressive text-to-speech synthesis requires both text processing and the rendering of natural expressive speech. This paper focuses on the former, a front-end task in the production of synthetic speech, and investigates a novel method for predicting emphasized accent phrases from advertisement text. For this purpose, we examine features that can be accurately extracted by text processing based on current text-to-speech synthesis technologies. We calculate the mutual information between the emphasis label and the features of Japanese advertisement sentences. Among these features, the surface strings of the main content and function words and the part of speech of the main function words in an accent phrase show higher potential for predicting whether the accent phrase should be emphasized. Experiments confirm that emphasized accent phrase prediction using a support vector machine (SVM) offers encouraging accuracy for systems that require emphasized accent phrase locations as context information to improve speech synthesis quality.
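As a rough illustration of the feature-ranking idea in the abstract, the following minimal sketch computes the mutual information between one discrete feature of an accent phrase and a binary emphasis label. The function name `mutual_information`, the feature values, and the toy data are hypothetical; the abstract does not specify the paper's feature encoding or estimation procedure, so this shows only the standard plug-in estimate from counts.

```python
import math
from collections import Counter


def mutual_information(feature_values, labels):
    """Plug-in estimate of I(X; Y) in bits for two discrete sequences.

    feature_values: one discrete feature value per accent phrase
    labels: one emphasis label per accent phrase (e.g. 1 = emphasized)
    """
    if len(feature_values) != len(labels):
        raise ValueError("feature_values and labels must have equal length")
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    count_x = Counter(feature_values)
    count_y = Counter(labels)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical toy data: part of speech of the main function word per phrase.
pos_feature = ["particle", "particle", "auxiliary", "auxiliary"]
emphasis = [1, 1, 0, 0]
score = mutual_information(pos_feature, emphasis)
```

A feature whose values determine the label completely reaches the entropy of the label, while a feature independent of the label scores zero. Ranking candidate features by this score is one simple way to judge which ones are worth passing to a classifier such as an SVM.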
Similar papers
Prosody Prediction from Linguistically Enriched Documents Based on a Machine Learning Approach
One of the main aspects in text-to-speech synthesis is the successful prediction of prosodic events. In this work we deal with the prediction of prosodic phrase breaks, accent tones and boundary tones from a linguistically XML-based enriched input (SOLE-ML) produced by a Natural Language Generator (NLG) system. We first extended the original specification of SOLE-ML in order for the NLG to prod...
Which resemblance is useful to predict phrase boundary rise labels for Japanese expressive text-to-speech synthesis, numerically-expressed stylistic or distribution-based semantic?
To establish expressive text-to-speech synthesis, current research studies both the processing of input text and the rendering of natural expressive speech. Focusing on the former as a front-end task in the production of synthetic speech, this paper investigates a novel feature for predicting phrase boundary tone labels which transcribe local fundamental frequency (F0) changes frequently appear...
Corpus-based Generation of F0 Contours Using Generation Process Model for Emotional Speech Synthesis
A corpus-based method was developed for generating fundamental frequency contours in emotional speech synthesis. The method assumes the generation process model and predicts its command parameters (positions and amplitudes) using binary regression trees with the input of linguistic information of the sentence to be synthesized. Because of the model constraint, a certain quality is still kept in...
Accent Sandhi Estimation of Tokyo Dialect of Japanese Using Conditional Random Fields
When synthesizing speech from Japanese text, correct assignment of accent nuclei for input text with arbitrary content is indispensable for obtaining natural-sounding synthetic speech. A phenomenon called accent sandhi occurs in Japanese utterances: when a word is uttered in a sentence, its accent nucleus may change depending on the context of the preceding and succeeding words. This paper descri...
Focus And Accent In A Dutch Text-To-Speech System
In this paper we discuss an algorithm for the assignment of pitch accent positions in text-to-speech conversion. The algorithm is closely modeled on current linguistic accounts of accent placement, and assumes a surface syntactic analysis of the input. It comprises a small number of heuristic rules for determining which phrases of a sentence are to be focussed upon; the exact location of a pitc...